Hao Shi

Seeing Together: Multi-Robot Cooperative Egocentric Spatial Reasoning with Multimodal Large Language Models

May 19, 2026
Via arXiv

EgoEV-HandPose: Egocentric 3D Hand Pose Estimation and Gesture Recognition with Stereo Event Cameras

May 12, 2026
Via arXiv

E-VLA: Event-Augmented Vision-Language-Action Model for Dark and Blurred Scenes

Apr 06, 2026
Via arXiv

Speech-Worthy Alignment for Japanese SpeechLLMs via Direct Preference Optimization

Mar 13, 2026
Via arXiv

Streaming Translation and Transcription Through Speech-to-Text Causal Alignment

Mar 12, 2026
Via arXiv

O3N: Omnidirectional Open-Vocabulary Occupancy Prediction

Mar 12, 2026
Via arXiv

OccTrack360: 4D Panoptic Occupancy Tracking from Surround-View Fisheye Cameras

Mar 09, 2026
Via arXiv

RMBench: Memory-Dependent Robotic Manipulation Benchmark with Insights into Policy Design

Mar 01, 2026
Via arXiv

Training-Free Intelligibility-Guided Observation Addition for Noisy ASR

Feb 24, 2026
Via arXiv

ExoGS: A 4D Real-to-Sim-to-Real Framework for Scalable Manipulation Data Collection

Jan 26, 2026
Via arXiv